Abstract

Chain records are a new type of multidimensional record. We discuss how often chain records are broken when the background sampling is from the unit cube with uniform distribution (or, more generally, from an arbitrary continuous product distribution).
1 Introduction

Consider independent marks $X_1, X_2, \dots$ sampled from the uniform distribution in $Q_d = [0,1]^d$. We define a mark $X_n$ to be a chain record if $X_n$ beats the last chain record in $X_1, \dots, X_{n-1}$. More precisely, record values and record indices are introduced recursively, by setting $T_1 = 1$, $R_1 = X_1$ and

$$T_k = \min\{n > T_{k-1} : X_n \prec R_{k-1}\}, \qquad R_k = X_{T_k}, \qquad k > 1.$$

Here $\prec$ denotes the standard strict partial order on $\mathbb{R}^d$, defined in terms of the componentwise orders by

$$x = (x^{(1)}, \dots, x^{(d)}) \prec y = (y^{(1)}, \dots, y^{(d)}) \iff x \neq y \ \text{and}\ x^{(i)} \le y^{(i)} \ \text{for}\ i = 1, \dots, d.$$



It is easy to see that, in any dimension $d$, the terms of $(T_k)$ are indeed well defined for all $k$, that is, chain records occur infinitely often.
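The recursive definition translates directly into a linear scan over the marks. The following Python sketch is illustrative only: the function names `precedes` and `chain_record_indices` are our own, and marks are assumed to be given as tuples of floats.

```python
import random
from typing import List, Sequence


def precedes(x: Sequence[float], y: Sequence[float]) -> bool:
    """Strict componentwise order: x != y and x[i] <= y[i] for every coordinate i."""
    return tuple(x) != tuple(y) and all(a <= b for a, b in zip(x, y))


def chain_record_indices(marks: List[Sequence[float]]) -> List[int]:
    """Return the 1-based record indices T_1 < T_2 < ... of the chain records."""
    if not marks:
        return []
    indices = [1]          # T_1 = 1 and R_1 = X_1
    last = marks[0]        # the current chain record R_k
    for n, x in enumerate(marks[1:], start=2):
        if precedes(x, last):      # X_n beats the last chain record
            indices.append(n)
            last = x
    return indices


if __name__ == "__main__":
    rng = random.Random(1)
    d, n = 2, 10_000
    marks = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    print(chain_record_indices(marks))
```

Each new mark is compared only with the current chain record, so the scan costs $O(nd)$ operations.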

Although the definition is an obvious restatement of the classical definition of a lower record, this notion of multidimensional record has not been explored so far. Chain records interpolate between two other types of multidimensional records which have been studied in some depth [...]. We say that a strong record occurs at index $n$ if either $n = 1$, or $n > 1$ and

$$X_n \prec X_j \quad \text{for } j = 1, \dots, n-1.$$

In the terminology of partially ordered sets, a strong record $X_n$ is the least element of the point set $\{X_1, \dots, X_n\}$. Since repetitions in each component have probability zero, $X_n$ is a strong record if and only if there are $d$ marginal strict lower records at index $n$ simultaneously. We say that a weak record occurs at index $n$ if either $n = 1$, or $n > 1$ and

$$X_j \not\prec X_n \quad \text{for } j = 1, \dots, n-1.$$



Postal address: Department of Mathematics, Utrecht University, Postbus 80010, 3508 TA Utrecht, The Netherlands.
E-mail address: gnedin@math.uu.nl 



Figure 1: Chain records in the square

A weak record $X_n$ is a minimal element of the set $\{X_1, \dots, X_n\}$. Obviously, each strong record is a chain record. Also, each chain record is a weak record, as follows easily by induction from the transitivity of the relation $\prec$. To illustrate, for the two-dimensional configuration of points in Figure 1 the weak records occur at indices 1, 2, 3, 5, 6, 7, 8, 9, a sole strong record occurs at 1, the marginal records occur at 1, 2, 3, 5, 6, and the chain records occur at indices 1, 5, 8. Notably, the chain records are more sensitive to the arrangement of the marks in sequence: a permutation of $X_1, \dots, X_{n-1}$ may destroy or create a chain record at index $n$.
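To make the three notions concrete, here is a brute-force Python sketch (quadratic in $n$; the names are hypothetical) that classifies every index of a finite sample as a strong, weak or chain record. On random input one can check the inclusions strong $\subseteq$ chain $\subseteq$ weak stated above.

```python
import random
from typing import List, Sequence, Tuple


def precedes(x: Sequence[float], y: Sequence[float]) -> bool:
    """Strict componentwise order: x != y and x[i] <= y[i] for every i."""
    return tuple(x) != tuple(y) and all(a <= b for a, b in zip(x, y))


def record_indices(marks: List[Sequence[float]]) -> Tuple[List[int], List[int], List[int]]:
    """Return the 1-based indices of (strong, weak, chain) records by brute force."""
    strong, weak, chain = [], [], []
    last_chain = None
    for n, x in enumerate(marks, start=1):
        earlier = marks[: n - 1]
        if all(precedes(x, y) for y in earlier):           # X_n lies below every earlier mark
            strong.append(n)
        if not any(precedes(y, x) for y in earlier):       # no earlier mark lies below X_n
            weak.append(n)
        if last_chain is None or precedes(x, last_chain):  # X_n beats the last chain record
            chain.append(n)
            last_chain = x
    return strong, weak, chain


if __name__ == "__main__":
    rng = random.Random(7)
    marks = [tuple(rng.random() for _ in range(2)) for _ in range(500)]
    s, w, c = record_indices(marks)
    print(len(s), len(c), len(w))
```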

Denote by $\underline{N}_n$, $\overline{N}_n$ and $N_n$, respectively, the counts of strong, weak and chain records among the first $n$ marks. Thus

$$\underline{N}_n \le N_n \le \overline{N}_n.$$

To show concretely the extent of the compromise between weak and strong records, we need some estimates of how often records of the different kinds occur.

Recall that in the case $d = 1$ the occurrences of records are independent, with probability $1/n$ at index $n$; this basic fact (known as the Dwass-Rényi lemma [...]) implies that the number of classical records is asymptotically Gaussian with both mean and variance about $\log n$. This translates easily to the marginal records in $d$ dimensions, since the marginal rankings are independent. The latter kind of independence is characteristic of sampling from product distributions in $\mathbb{R}^d$ with continuous marginals, hence the instance of $Q_d$ with uniform distribution covers the general product case.

Properties of the strong-record counts for sampling from $Q_d$ are also rather simple. By independence of the marginal rankings we have a representation $\underline{N}_n = I_1 + \dots + I_n$ with independent Bernoulli indicators and $\underline{p}_n := \mathbb{P}(I_n = 1) = n^{-d}$. Thus

$$\mathbb{E}[\underline{N}_n] = \sum_{j=1}^{n} \frac{1}{j^d}.$$

Since for $d > 1$ the series $\sum_j \underline{p}_j$ converges, the Borel-Cantelli lemma implies that the total number of strong records in the infinite sequence of marks is almost surely finite.

Counting the weak records is a more delicate matter since their occurrences are not independent. However, we may exploit a correspondence between weak records in $Q_d$ and the minimal elements in $Q_{d+1}$ (depending on the context, these points are also called Pareto, admissible, efficient, etc.). The correspondence is established by arranging the marks in $d+1$ dimensions in increasing order of one fixed component. By induction in $d$ one can show that

$$\mathbb{E}[\overline{N}_n] = \sum_{1 \le j_1 \le \dots \le j_d \le n} \frac{1}{j_1 \cdots j_d} \sim \frac{(\log n)^d}{d!},$$

see [...]. From further known results (see [2] and references therein) it follows that the variance $\mathrm{Var}[\overline{N}_n]$ is of the same order $(\log n)^d$, and that $\overline{N}_n$ is asymptotically Gaussian.
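Both exact expectations can be evaluated in rational arithmetic. This sketch (function names hypothetical) computes $\mathbb{E}[\underline{N}_n]$ directly and $\mathbb{E}[\overline{N}_n]$ through the recursion $S(m,e) = \sum_{k=1}^{m} S(k,e-1)/k$, $S(m,0) = 1$, which is one way, assumed here, to organise the nested sum over $j_1 \le \dots \le j_d$.

```python
import math
from fractions import Fraction


def expected_strong(n: int, d: int) -> Fraction:
    """E[number of strong records among n marks] = sum_{j=1}^n j^(-d)."""
    return sum((Fraction(1, j ** d) for j in range(1, n + 1)), Fraction(0))


def expected_weak(n: int, d: int) -> Fraction:
    """E[number of weak records] = sum over 1 <= j_1 <= ... <= j_d <= n of 1/(j_1 ... j_d).

    Uses S(m, e) = sum_{k=1}^m S(k, e-1) / k with S(m, 0) = 1, so that S(n, d) is the target.
    """
    prev = [Fraction(1)] * (n + 1)      # S(m, 0) = 1 (index 0 is unused)
    for _ in range(d):
        cur = [Fraction(0)] * (n + 1)
        acc = Fraction(0)
        for k in range(1, n + 1):
            acc += prev[k] / k          # running sum of S(k, e-1) / k
            cur[k] = acc
        prev = cur
    return prev[n]


if __name__ == "__main__":
    n, d = 200, 2
    print(float(expected_strong(n, d)), float(expected_weak(n, d)),
          math.log(n) ** d / math.factorial(d))
```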

Thus the strong records are much rarer, and the weak records much more frequent, than the classical records. In this note we show that, as far as frequency is concerned, the chain records in any dimension $d$ are more in line with the classical records:

Proposition 1. For sampling from $Q_d$ with uniform distribution, the number of chain records $N_n$ is asymptotically Gaussian with moments

$$\mathbb{E}[N_n] \sim d^{-1} \log n, \qquad \mathrm{Var}[N_n] \sim d^{-2} \log n.$$

The CLT will be proved in Section 3. Beyond that, we will derive exact and asymptotic formulas for the probability of a chain record and discuss some scaling limits.

The chain records comprise a 'greedy' chain in $\prec$, meaning that a mark is joined to the chain each time the chain constraint is not violated. More efficient nonanticipating algorithms for constructing long chains were designed in [3], and the length of the longest possible chain on $n$ random marks was estimated in [...]. From yet another perspective, the sequence of chain records corresponds to a particular path in a random data structure called the quad-tree [...].



2 The heights at records 

For $x \in Q_d$ the quadrant $L_x := \{y \in Q_d : y \prec x\}$ is the lower section of the partial order at $x$. The height $h(x)$ is the product of the coordinates of $x$, which, for the uniform distribution under focus, equals the value of the multidimensional distribution function at $x$, i.e. the measure of $L_x$. The height is a key quantity to look at, because the heights at chain records determine the sojourns between records. Let $H_k = h(R_k)$.

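Lemma 2. Given $(H_k)$, the sojourns $T_{k+1} - T_k$ are conditionally independent and geometric with parameters $H_k$, $k = 1, 2, \dots$

Proof. A new chain record $R_{k+1}$ occurs as soon as $L_{R_k}$ is hit by some mark, and each subsequent mark hits $L_{R_k}$ independently with probability $h(R_k) = H_k$. □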

The lemma has the following elementary but important consequence. 

Corollary 3. Given $(H_k)$, the conditional law of the occurrences of chain records is, for any $d$, the same as in the classical case $d = 1$.

The heights at records undergo a multiplicative renewal process, sometimes called stick-breaking. Let $W, W_1, W_2, W_3, \dots$ be i.i.d. copies of $H_1 = h(X_1)$.

Lemma 4. The heights $(H_k)$ have the same law as the sequence of products $(W_1 \cdots W_k,\ k = 1, 2, \dots)$.
Proof. Each lower section $L_x$, viewed as a partially ordered probability space with normalised Lebesgue measure, is isomorphic to $Q_d$ (via a coordinate-wise scale transformation). Hence all ratios $H_{k+1}/H_k$ are i.i.d., with the same law as $H_1$. □
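Lemmas 2 and 4 give a cheap way to simulate $N_n$ without generating the $n$ marks: draw the heights as running products of i.i.d. factors and the sojourns as geometric variables. The following Python sketch relies only on these two lemmas (names hypothetical; the geometric variable is drawn by inversion of its distribution function).

```python
import math
import random


def sample_w(d: int, rng: random.Random) -> float:
    """One factor W = h(X_1): the product of d independent uniforms on (0, 1]."""
    w = 1.0
    for _ in range(d):
        w *= 1.0 - rng.random()
    return w


def geometric(p: float, rng: random.Random) -> int:
    """Trials up to and including the first success, success probability p in (0, 1]."""
    if p >= 1.0:
        return 1
    u = 1.0 - rng.random()                          # u in (0, 1]
    return 1 + int(math.log(u) / math.log1p(-p))    # P(result > k) = (1 - p)^k


def count_chain_records(n: int, d: int, rng: random.Random) -> int:
    """Sample N_n from heights H_k = W_1 ... W_k (Lemma 4) and geometric sojourns (Lemma 2)."""
    t, h, count = 1, sample_w(d, rng), 1            # T_1 = 1, H_1 = W_1
    while h > 0.0:                                   # guard against floating-point underflow
        t += geometric(h, rng)                       # T_{k+1} - T_k ~ Geometric(H_k)
        if t > n:
            break
        count += 1
        h *= sample_w(d, rng)                        # H_{k+1} = H_k * W_{k+1}
    return count
```

For $d = 1$ the sample mean of $N_n$ should be close to the harmonic number $\sum_{j \le n} 1/j$, since classical records occur independently with probabilities $1/j$.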



Explicitly, the density of $W$ is

$$\mathbb{P}(W \in ds) = \frac{(-\log s)^{d-1}}{(d-1)!}\, ds, \qquad s \in [0,1], \tag{1}$$

and its Mellin transform is

$$g(\lambda) := \mathbb{E}[W^\lambda] = (\lambda+1)^{-d},$$

as follows by noting that $H_1$ is the product of $d$ independent uniform variables.
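A quick numerical sanity check of (1) and of the Mellin transform samples $W$ as a product of $d$ uniforms. The closed-form distribution function $\mathbb{P}(W \le s) = s \sum_{i=0}^{d-1} (-\log s)^i / i!$ used below is obtained by integrating (1); the helper names are hypothetical.

```python
import math
import random


def w_cdf(s: float, d: int) -> float:
    """P(W <= s) = s * sum_{i<d} (-log s)^i / i!, the integral of density (1)."""
    if s <= 0.0:
        return 0.0
    if s >= 1.0:
        return 1.0
    L = -math.log(s)
    return s * sum(L ** i / math.factorial(i) for i in range(d))


def mellin(lam: float, d: int) -> float:
    """g(lam) = E[W^lam] = (lam + 1)^(-d)."""
    return (lam + 1.0) ** (-d)


def sample_w(d: int, rng: random.Random) -> float:
    """W = H_1 = h(X_1), the product of d independent uniforms."""
    return math.prod(rng.random() for _ in range(d))


if __name__ == "__main__":
    rng = random.Random(0)
    d = 3
    ws = [sample_w(d, rng) for _ in range(100_000)]
    for lam in (0.5, 1.0, 2.0):
        print(lam, sum(w ** lam for w in ws) / len(ws), mellin(lam, d))
    for s in (0.05, 0.2, 0.5):
        print(s, sum(w <= s for w in ws) / len(ws), w_cdf(s, d))
```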

The distinction from the classical case $d = 1$ is visible already at this early stage of our discussion. In the classical case $H_1$ has the uniform distribution, hence the stick-breaking sequence $(W_1 \cdots W_k,\ k = 1, 2, \dots)$ is the sequence of points of a self-similar (i.e. invariant under homotheties) Poisson process with intensity $ds/s$, $s \in [0,1]$. For $d > 1$ the point process $(H_k)$ is neither Poisson nor self-similar, which is a major source of difficulties, leading, e.g., to dependence between the occurrences of chain records at distinct $n$.



Corollary 3 suggests focusing on a univariate sequence of random variables, modified by conditioning on its subsequence of record values and then mixing over some given distribution for that subsequence.

Let $(U_j)$ be a sequence of independent uniform points in $[0,1]$, independent of $(H_k)$. We shall produce a transformed sequence $(U_j) \backslash (H_k)$ by replacing some of the terms of $(U_j)$ by the $H_k$'s. Replace $U_1$ by $H_1$. Do not alter $U_2, U_3, \dots$ as long as they do not hit $[0, H_1[$; then replace the first uniform point hitting the interval $[0, H_1[$ by $H_2$. Inductively, once $H_1, \dots, H_k$ have been inserted, keep on screening the uniforms until the first one hits $[0, H_k[$, then insert $H_{k+1}$ in place of the uniform point that caused the hit, and so on. Eventually all the $H_k$'s will enter the resulting sequence. It is easy to see that, given $(H_k)$, the distribution of $(U_j) \backslash (H_k)$ is the same as the conditional distribution of $(U_j)$ given that its subsequence of lower record values is $(H_k)$. In the classical case, $(H_k)$ is the stick-breaking sequence with uniform factors, and we have $(U_j) \backslash (H_k) \stackrel{d}{=} (U_j)$, so the insertion does not alter the law of the sequence.
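The insertion procedure is easy to code for finite input. In this Python sketch (names hypothetical) the heights are supplied as a finite decreasing list, and the procedure simply stops inserting once the list is exhausted; that truncation is an artefact of finite input, not part of the construction.

```python
from typing import List, Sequence, Tuple


def insert_heights(us: Sequence[float], hs: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Transform uniforms `us` by inserting the decreasing heights `hs`.

    Returns the transformed sequence and the 1-based positions that received an H_k.
    """
    out = list(us)
    if not out or not hs:
        return out, []
    out[0] = hs[0]                # replace U_1 by H_1
    positions = [1]
    level, k = hs[0], 1           # level = last inserted height H_k
    for j in range(1, len(out)):
        if k == len(hs):          # finite input: no heights left to insert
            break
        if us[j] < level:         # this uniform hits [0, H_k[
            out[j] = hs[k]        # ... and is replaced by H_{k+1}
            positions.append(j + 1)
            level = hs[k]
            k += 1
    return out, positions


if __name__ == "__main__":
    print(insert_heights([0.9, 0.8, 0.3, 0.5, 0.1], [0.4, 0.2, 0.05]))
```

The number of replaced positions among the first $n$ plays the role of $N_n$.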

By Corollary 3, $N_n$ can be identified with the number of points among $U_1, \dots, U_n$ that get replaced by some of the $H_k$'s.

There is yet another related interpretation in terms of partially exchangeable partitions, as introduced in [23]. The unit interval $]0,1[$ is divided by $(H_k)$ into infinitely many disjoint subintervals $[H_1, H_0[$, $[H_2, H_1[$, ... (where $H_0 = 1$). A random partition $\Pi$ of the set $\mathbb{N}$ into disjoint nonempty blocks is defined by assigning two generic integers $m$ and $n$ to the same block if and only if the $m$th and the $n$th terms of $(U_j) \backslash (H_k)$ hit the same subinterval. The same partition $\Pi$ can be defined directly in terms of $(X_n)$, by decomposing $Q_d$ into the disjoint layers $Q_d \setminus L_{R_1}$, $L_{R_1} \setminus L_{R_2}$, .... Clearly, $T_1, T_2, \dots$ are the minimal integers in the blocks of $\Pi$, and $N_n$ is the number of blocks represented among the first $n$ integers.
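For finitely many terms, the blocks of $\Pi$ can be read off by locating each term in the subintervals $[H_k, H_{k-1}[$. The sketch below (names hypothetical) takes an already transformed sequence together with the heights; the example input is what the insertion procedure produces from the uniforms $0.9, 0.8, 0.3, 0.5, 0.1$ and the heights $0.4, 0.2, 0.05$.

```python
from typing import Dict, List, Sequence


def partition_blocks(seq: Sequence[float], hs: Sequence[float]) -> List[List[int]]:
    """Group 1-based positions by the subinterval [H_k, H_{k-1}[ (with H_0 = 1) they hit."""
    bounds = [1.0] + list(hs)                   # H_0 = 1, H_1, H_2, ...
    blocks: Dict[int, List[int]] = {}
    for pos, v in enumerate(seq, start=1):
        for k in range(1, len(bounds)):
            if bounds[k] <= v < bounds[k - 1]:
                blocks.setdefault(k, []).append(pos)
                break
        else:
            raise ValueError(f"value {v} lies below all supplied heights")
    return [blocks[k] for k in sorted(blocks)]


if __name__ == "__main__":
    blocks = partition_blocks([0.4, 0.8, 0.2, 0.5, 0.05], [0.4, 0.2, 0.05])
    print(blocks, [min(b) for b in blocks])     # minimal elements are the record indices
```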

The construction of $(U_j) \backslash (H_k)$ does not impose any constraints on the law of the sequence $(H_k)$, which can be an arbitrary nonincreasing sequence (the induced $\Pi$ is then the most general partially exchangeable partition [...]).


3 Proving the CLT

With this in mind, we shall for a while take a more general approach and assume (as in [17]) that $H_k = W_1 \cdots W_k$, $k = 1, 2, \dots$, where $W_1, W_2, \dots$ are independent copies of a random variable $W \in [0,1]$ with finite logarithmic moments

$$\mu = \mathbb{E}[-\log W], \qquad \sigma^2 = \mathrm{Var}[-\log W].$$

Proposition 5. As $n \to \infty$, the variable $N_n$ is asymptotically Gaussian with moments

$$\mathbb{E}[N_n] \sim \frac{1}{\mu} \log n, \qquad \mathrm{Var}[N_n] \sim \frac{\sigma^2}{\mu^3} \log n.$$

We also have the strong law

$$N_n \sim \frac{1}{\mu} \log n \quad \text{a.s.}$$

Proof. Our strategy is to show that $N_n$ is close to $K_n := \max\{k : H_k > 1/n\}$. By the renewal theorem [...], $K_n$ is asymptotically Gaussian with mean $\mu^{-1} \log n$ and variance $\sigma^2 \mu^{-3} \log n$, because $K_n$ is just the number of epochs in $[0, \log n]$ of the renewal process with steps $-\log W_j$.

By the construction of $(U_j) \backslash (H_k)$, we have a dichotomy: $U_n \in \,]H_k, H_{k-1}]$ implies that $U_n$ either enters the transformed sequence or gets replaced by some $H_i \ge H_k$. Let $U_{n1} < \dots < U_{nn}$ be the order statistics of $U_1, \dots, U_n$. It follows that

(i) if $U_{nj} > H_k$ then $N_n \le k + j$,

(ii) if $U_{nk} < H_k$ then $N_n \ge k$.

Let $\xi_n$ be the number of uniform order statistics smaller than $1/n$. By definition, $H_{K_n+1} \le 1/n < H_{K_n}$. Since $K_n$ depends only on $(H_k)$ and $\xi_n$ only on $(U_j)$, the variables $K_n$ and $\xi_n$ are independent, and $\xi_n$ is binomial$(n, 1/n)$. By (i), applied with $k = K_n + 1$ and $j = \xi_n + 1$, we have $N_n \le K_n + \xi_n + 2$, where $\xi_n$ is approximately Poisson(1); this yields the desired upper bound.

Now consider the threshold $s_n = (\log n)^2/n$ and let $J_n := \max\{k : H_k > s_n\}$. By (ii), if the number of order statistics smaller than $s_n$ is at least $J_n$, then $N_n \ge J_n$. Because $-\log s_n = \log n - 2 \log\log n \sim \log n$, the index $J_n$ is still asymptotically Gaussian with the same asymptotic moments as $K_n$. On the other hand, the number of order statistics smaller than $s_n$ is binomial$(n, s_n)$, hence asymptotically Gaussian with mean and variance about $(\log n)^2$. Elementary large deviation bounds therefore imply that $N_n \ge J_n$ with probability very close to one. This yields a suitable lower bound, hence the CLT. Along the same lines, the strong law of large numbers follows from $N_n \sim K_n$. □
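The comparison between $N_n$ and $K_n$ at the heart of the proof can be watched in simulation. This Python sketch (hypothetical names) uses Lemma 2 to generate $N_n$ from one realisation of $(H_k)$ and reads off $K_n$ from the same realisation, taking $W$ to be a product of $d$ uniforms as in Section 2.

```python
import math
import random
from typing import Tuple


def sample_w(d: int, rng: random.Random) -> float:
    """Factor W: the product of d independent uniforms on (0, 1]."""
    w = 1.0
    for _ in range(d):
        w *= 1.0 - rng.random()
    return w


def sample_n_and_k(n: int, d: int, rng: random.Random) -> Tuple[int, int]:
    """Return (N_n, K_n) computed from a single realisation of the heights (H_k)."""
    heights = [sample_w(d, rng)]              # H_1; the first mark is always a record
    t = 1                                     # T_1 = 1
    while heights[-1] > 0.0:
        h = heights[-1]
        if h >= 1.0:
            t += 1
        else:
            u = 1.0 - rng.random()
            t += 1 + int(math.log(u) / math.log1p(-h))   # Geometric(H_k) sojourn
        if t > n:
            break
        heights.append(h * sample_w(d, rng))  # the next record has height H_{k+1}
    n_n = len(heights)                        # number of chain records among n marks
    # Extend the same height sequence until it drops to 1/n, then count K_n.
    while heights[-1] > 1.0 / n:
        heights.append(heights[-1] * sample_w(d, rng))
    k_n = sum(1 for h in heights if h > 1.0 / n)
    return n_n, k_n
```

For $d = 2$ and $n = 10^6$ the difference of the sample means stays of order one, while both means are of order $\frac12 \log n$.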

Similar limit theorems have been proved by other methods for the number of blocks of exchangeable partitions in [17], and for a random (size-biased) path in a quad-tree [...].

Proposition 1 follows as an instance of Proposition 5 by computing the logarithmic moments as

$$\mu = \mathbb{E}[-\log W] = -g'(0) = d, \qquad \sigma^2 = \mathrm{Var}[-\log W] = g''(0) - g'(0)^2 = d.$$
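For the uniform case the derivatives of $g(\lambda) = (\lambda+1)^{-d}$ are explicit, and substituting them into Proposition 5 recovers the constants of Proposition 1:

```latex
g(\lambda) = (\lambda+1)^{-d}, \qquad
g'(\lambda) = -d\,(\lambda+1)^{-d-1}, \qquad
g''(\lambda) = d(d+1)\,(\lambda+1)^{-d-2},

\mu = -g'(0) = d, \qquad
\sigma^2 = g''(0) - g'(0)^2 = d(d+1) - d^2 = d,

\frac{1}{\mu} = \frac{1}{d}, \qquad
\frac{\sigma^2}{\mu^3} = \frac{d}{d^3} = \frac{1}{d^2}.
```
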

4 Poisson-paced records

The probability $p_n$ of a chain record at index $n$ is equal to the mean height of the last chain record before $n$. Asymptotics for these quantities follow most easily by poissonisation.
Let $(\tau_n)$ be the increasing sequence of points of a homogeneous Poisson point process (PPP) on $\mathbb{R}_+$, independent of the marks $(X_n)$. The sequence $((X_n, \tau_n),\ n = 1, 2, \dots)$ is then the sequence of points of a homogeneous PPP in $Q_d \times \mathbb{R}_+$, listed in increasing order of the time component, which now assumes values in the continuous range $\mathbb{R}_+$. Let $N_t$ be the number of chain records and $B_t$ the height of the last chain record on $[0, t]$, that is

$$N_t = \max\{k : \tau_{T_k} \le t\}, \qquad B_t = H_{N_t},$$

with the convention $H_0 = 1$, so that $B_t = 1$ before the first arrival.
Clearly, $(B_t)$ is the stochastic intensity of $(N_t)$, that is, $\int_0^t B_s\,ds$ is the predictable compensator of $N_t$; in particular, $\mathbb{E}[N_t] = \int_0^t \mathbb{E}[B_s]\,ds$. Proposition 5 translates literally into a CLT for $N_t$ as $t \to \infty$.

The process $(B_t)$ is a time-homogeneous Markov process with a very simple type of behaviour. Given $B_t = b$, the process remains in state $b$ for an exponential time with rate $b$ and then jumps to a new state $bW$, with $W$ a generic copy of $H_1$. Immediate from this description is the following self-similarity property: the law of $(B_t)$ with initial state $B_0 = b$ is the same as the law of the process $(b B_{bt})$ with $B_0 = 1$. This kind of process is well defined for an arbitrary initial state $b > 0$. See [...] for features of this process related to the classical records and [7] for more general self-similar (also called semi-stable) processes related to increasing Lévy processes. The process $(B_t)$ with $B_0 = b$ is naturally associated with the chain records defined in terms of a homogeneous PPP in $bQ_d \times \mathbb{R}_+$, with $bQ_d$ being the cube with side $[0, b]$.
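The verbal description of $(B_t)$ is directly simulable. The following Python sketch (names hypothetical) draws $B_t$ by alternating exponential holding times and multiplicative jumps. The test compares $\mathbb{E}[B_t]$ for $d = 1$ with $(1 - e^{-t})/t$, which is what the poissonisation identity of this section gives when $p_n = 1/n$, and checks the self-similarity relation on the level of means.

```python
import math
import random


def sample_w(d: int, rng: random.Random) -> float:
    """Jump factor W: the product of d independent uniforms on (0, 1]."""
    w = 1.0
    for _ in range(d):
        w *= 1.0 - rng.random()
    return w


def sample_b(t: float, d: int, rng: random.Random, b0: float = 1.0) -> float:
    """Value B_t of the jump process: stay in state b for an Exp(rate b) time, then jump to b * W."""
    s, b = 0.0, b0
    while b > 0.0:
        s += rng.expovariate(b)      # holding time in state b has rate b
        if s > t:
            break
        b *= sample_w(d, rng)        # jump to the new state b * W
    return b
```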

By the self-similarity of $(B_t)$, the moments

$$m_\beta(t) := \mathbb{E}[B_t^\beta]$$

satisfy a renewal-type equation

$$m_\beta'(t) = -m_\beta(t) + \mathbb{E}\bigl[W^\beta\, m_\beta(tW)\bigr].$$

The series solution to this equation with the initial value $m_\beta(0) = 1$ is

$$m_\beta(t) = \sum_{k=0}^{\infty} \frac{(-t)^k}{k!} \prod_{j=0}^{k-1} \bigl(1 - g(j+\beta)\bigr), \qquad \text{with } g(\lambda) = \frac{1}{(\lambda+1)^d},$$

as one can check by direct substitution (see e.g. [...]).
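The series can be summed numerically for moderate $t$. In this sketch (names hypothetical) consecutive terms are updated by the factor $-t\,(1 - g(k+\beta))/(k+1)$. For large $t$ the alternating terms cancel heavily and double precision eventually fails, so arbitrary-precision arithmetic would be needed there. For $\beta = 1$, combining the series with the poissonisation identity below and the values $p_n = 1/n$ ($d = 1$) and $p_1 = 1$, $p_n = 1/(2n)$ for $n > 1$ ($d = 2$) gives the closed forms $(1 - e^{-t})/t$ and $e^{-t}/2 + (1 - e^{-t})/(2t)$, which the test uses as reference values.

```python
import math


def g(lam: float, d: int) -> float:
    """Mellin transform g(lam) = (lam + 1)^(-d)."""
    return (lam + 1.0) ** (-d)


def m_series(t: float, beta: int, d: int, terms: int = 200) -> float:
    """Partial sum of m_beta(t) = sum_k (-t)^k / k! * prod_{j<k} (1 - g(j + beta))."""
    total, term = 0.0, 1.0        # term_k = (-t)^k / k! * prod_{j<k} (1 - g(j + beta))
    for k in range(terms):
        total += term
        term *= -t / (k + 1) * (1.0 - g(k + beta, d))
    return total


if __name__ == "__main__":
    for t in (0.5, 1.0, 3.0, 6.0):
        print(t, [m_series(t, 1, d) for d in (1, 2, 3)])
```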

Since $m_1(t)$ is the probability that the first arrival after $t$ is a chain record, we have the poissonisation identity

$$m_1(t) = e^{-t} \sum_{n=0}^{\infty} \frac{t^n}{n!}\, p_{n+1},$$

which implies, upon equating the coefficients of the series,

$$p_n = \sum_{k=0}^{n-1} \binom{n-1}{k} (-1)^k \prod_{i=0}^{k-1} \bigl(1 - g(i+1)\bigr). \tag{2}$$
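Formula (2) can be evaluated exactly in rational arithmetic for moderate $n$. A sketch (names hypothetical) using Python's `fractions.Fraction` and `math.comb`, checked against the values for $d = 1$ and $d = 2$ quoted in the text:

```python
from fractions import Fraction
from math import comb


def g(lam: int, d: int) -> Fraction:
    """Mellin transform g(lam) = (lam + 1)^(-d), exact for integer lam."""
    return Fraction(1, (lam + 1) ** d)


def p_chain(n: int, d: int) -> Fraction:
    """Exact probability of a chain record at index n, from formula (2)."""
    total, prod = Fraction(0), Fraction(1)     # prod = prod_{i<k} (1 - g(i + 1))
    for k in range(n):
        total += comb(n - 1, k) * (-1) ** k * prod
        prod *= 1 - g(k + 1, d)
    return total


if __name__ == "__main__":
    print([p_chain(n, 3) for n in range(1, 6)])
```

The alternating sum is harmless in exact arithmetic but would lose all accuracy in floating point for large $n$.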
This can be compared with the analogous formulas

$$\underline{p}_n = g(n-1), \qquad \overline{p}_n = \sum_{k=0}^{n-1} \binom{n-1}{k} (-1)^k g(k)$$

for the occurrences of strong and weak records, so it would be nice to have a direct combinatorial argument for (2).

For $d = 1$ we obtain from (2) the familiar $p_n = 1/n$, and for $d = 2$ we obtain the surprisingly simple $p_n = 1/(2n)$ (for $n > 1$). For $d > 2$ the formulas for $p_n$ do not simplify.
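Since strong records are chain records and chain records are weak records, the three probabilities must be ordered as $\underline{p}_n \le p_n \le \overline{p}_n$. The following sketch (names hypothetical) evaluates all three formulas exactly and checks this ordering; for $d = 1$ all three reduce to $1/n$.

```python
from fractions import Fraction
from math import comb


def g(lam: int, d: int) -> Fraction:
    """Mellin transform g(lam) = (lam + 1)^(-d)."""
    return Fraction(1, (lam + 1) ** d)


def p_strong(n: int, d: int) -> Fraction:
    """Probability of a strong record at index n."""
    return g(n - 1, d)


def p_weak(n: int, d: int) -> Fraction:
    """Probability of a weak record at index n."""
    return sum((comb(n - 1, k) * (-1) ** k * g(k, d) for k in range(n)), Fraction(0))


def p_chain(n: int, d: int) -> Fraction:
    """Probability of a chain record at index n, formula (2)."""
    total, prod = Fraction(0), Fraction(1)
    for k in range(n):
        total += comb(n - 1, k) * (-1) ** k * prod
        prod *= 1 - g(k + 1, d)
    return total


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, p_strong(n, 3), p_chain(n, 3), p_weak(n, 3))
```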

Factoring

$$1 - g(\lambda) = \frac{\lambda \prod_{r=1}^{d-1} \bigl(\lambda + 1 - e^{2\pi i r/d}\bigr)}{(\lambda+1)^d},$$

we see that the series for $m_\beta$ is a generalised hypergeometric function of the type ${}_dF_d$. Exploiting the asymptotic properties of this class of functions, we determine the asymptotics

$$\lim_{t \to \infty} m_\beta(t)\, t^\beta = \frac{1}{-g'(0)} \prod_{r=1}^{\beta-1} \frac{r}{1 - g(r)} = \frac{(\beta-1)!}{d} \prod_{r=1}^{\beta-1} \frac{1}{1 - g(r)}, \tag{3}$$

where $\beta = 1, 2, \dots$. A full asymptotic expansion is obtainable in a similar way; see [...] for details of the method and references. Depoissonisation of the $\beta = 1$ instance implies, quite expectedly,

$$p_n \sim \frac{1}{dn}, \qquad n \to \infty.$$

The following asymptotics for $B_t$ is also derived from (3) by an application of the method of moments.

Proposition 6. As $t \to \infty$, the random variable $tB_t$ converges in distribution and with all moments to a random variable $Y$ whose moments are given by

$$\mathbb{E}[Y^\beta] = \frac{1}{d} \prod_{r=2}^{\beta} \frac{r-1}{1 - r^{-d}}, \qquad \beta = 1, 2, \dots \tag{4}$$

The law of $Y$, determined uniquely by the moments (4), may be considered as a kind of extreme-value distribution. In the case $d = 1$ we recover the well-known $Y \stackrel{d}{=} E$ with $E$ standard exponential, and for $d = 2$ we get $Y \stackrel{d}{=} EU$ with $E$ and $U$ independent exponential and uniform random variables. In general, there is a series representation

$$Y = E_0 W_0 + \sum_{k=1}^{\infty} E_k \prod_{j=0}^{k} W_j,$$

where the $E_k$'s are standard exponential, the $W_j$'s for $j \ge 1$ are as before, $W_0$ has density

$$\mathbb{P}(W_0 \in ds) = \frac{\mathbb{P}(W < s)}{d\, s}\, ds, \qquad s \in [0,1] \tag{5}$$

(which is the density of the stationary distribution for the stick-breaking with factor $W$), and all variables are independent. Also, $Y$ may be interpreted as an exponential functional of a stationary compound Poisson process with initial state $-\log W_0$ and generic jump $-\log W$, see [...]. In the discrete-time setting, the same limit law applies to $n$ times the height of the last chain record before $n$.
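The series representation suggests a Monte Carlo sampler for $Y$. To draw $W_0$ from (5), this sketch uses a standard fact from renewal theory, stated here as our own assumption rather than as part of the paper: $-\log W_0$ has the law of $V G^*$, where $V$ is uniform and $G^*$ is the size-biased version of $-\log W$, which in the uniform case is Gamma$(d+1, 1)$. The series is truncated once the running product falls below a tolerance; the names are hypothetical.

```python
import math
import random


def sample_w(d: int, rng: random.Random) -> float:
    """Factor W: the product of d independent uniforms on (0, 1]."""
    w = 1.0
    for _ in range(d):
        w *= 1.0 - rng.random()
    return w


def sample_w0(d: int, rng: random.Random) -> float:
    """W_0 with density (5): -log W_0 = V * G, V uniform, G ~ Gamma(d + 1, 1) (assumed fact)."""
    return math.exp(-rng.random() * rng.gammavariate(d + 1, 1.0))


def sample_y(d: int, rng: random.Random, tol: float = 1e-12) -> float:
    """Truncated series Y = sum_{k >= 0} E_k * W_0 * W_1 * ... * W_k."""
    prod, y = sample_w0(d, rng), 0.0
    while prod > tol:
        y += rng.expovariate(1.0) * prod     # E_k times the running product
        prod *= sample_w(d, rng)
    return y
```

For $d = 2$, (4) gives $\mathbb{E}[Y] = 1/2$ and $\mathbb{E}[Y^2] = 2/3$, which the empirical moments should approach.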

5 Scaling limits 

Let $b > 0$ be a scaling parameter which we will send to $\infty$. In the case of one dimension the point process $\{R_k\}$ of record values is a self-similar PPP on $\mathbb{R}_+$ with intensity $dx/x$ (restricted to $x \in [0,1]$). The same limit appears also for the point process of record times $\{T_k/b\}$. The bivariate point process $\{(bR_k, T_k/b),\ k = 1, 2, \dots\}$ has a joint scaling limit, which may be identified with the set of minimal points (the Pareto boundary) of the homogeneous PPP in $\mathbb{R}_+^2$. See [...] for these classical results.

These facts can be generalised to chain records in $d > 1$ dimensions. Observe that for the values of chain records we have the component-wise representation

$$R_k^{(j)} = U_1^{(j)} \cdots U_k^{(j)}, \qquad j = 1, \dots, d;\ k = 1, 2, \dots$$

with independent uniform $U_i^{(j)}$'s. Therefore, each marginal process $\{bR_k^{(j)},\ k = 1, 2, \dots\}$ converges to the same self-similar PPP on $\mathbb{R}_+$. The vector point process $\{bR_k\}$ converges, as $b \to \infty$, to a degenerate limit which lives on the union of the coordinate axes (this follows because any level $c/b$ is surpassed by one of the marginal sequences $\{R_k^{(j)}\}$ considerably before the others). More interestingly, there is a planar limit for the joint process of heights and record times.

Proposition 7. The scaled point process $\{(bH_k, T_k/b),\ k = 1, 2, \dots\}$ has a weak limit as $b \to \infty$. The limiting point process $\mathcal{R}$ in $\mathbb{R}_+^2$ is invariant under the hyperbolic shifts $(s, t) \mapsto (bs, t/b)$ (with $b > 0$), and the coordinate projections of $\mathcal{R}$ are self-similar point processes.

Proof. The existence of the limit follows from the analogous result for Poisson-paced marks, and in the latter setup the result follows from [..., Theorem 1], which, adapted to our framework, guarantees the existence of the entrance law from $\infty$ for the process $(B_t)$ started at $B_0 = b$, as $b \to \infty$. The hyperbolic invariance follows from the self-similarity of $(B_t)$. □

A more explicit construction of $\mathcal{R}$ is the following. Let $\mathcal{H}$ be the multiplicatively stationary (that is, self-similar) multiplicative renewal process with a generic factor $W$. We may view $\mathcal{H}$ as an extension to $\mathbb{R}_+$ from $[0,1]$ of the stick-breaking point process

$$\{W_0,\ W_0 W_1,\ W_0 W_1 W_2,\ \dots\},$$

where $W_k \stackrel{d}{=} W$ for $k \ge 1$ and $W_0$ has the stationary density (5). Let $\{\xi_k,\ k \in \mathbb{Z}\}$ be the points of $\mathcal{H}$, labelled so that $\xi_0 = W_0$ is the maximum point of $\mathcal{H} \cap [0,1]$ and $\xi_{-1} > 1$. Assign to each $\xi_k$ an arrival time $a_k := \sum_{i=-\infty}^{k-1} E_i/\xi_i$, where the $E_i$'s are independent standard exponential variables, also independent of $\mathcal{H}$. Then let $\mathcal{R} := \{(\xi_k, a_k),\ k \in \mathbb{Z}\}$. The hyperbolic invariance of $\mathcal{R}$ is obvious from the construction and the self-similarity of $\mathcal{H}$.
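A finite window of $\mathcal{R}$ can be sampled along these lines. This sketch is an assumption-laden illustration, not the paper's construction verbatim: it uses the size-biased renewal fact (the log-scale interval straddling level 1 has Gamma$(d+1, 1)$ length, split at a uniform fraction) to generate $\xi_0$ and $\xi_{-1}$, generates the remaining points with i.i.d. factors distributed as $W$, and truncates the sum defining $a_k$ at $i \ge -\text{back}$. The omitted terms are negligible because $\xi_i$ grows geometrically as $i \to -\infty$. Names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def sample_w(d: int, rng: random.Random) -> float:
    """Factor W: the product of d independent uniforms on (0, 1]."""
    w = 1.0
    for _ in range(d):
        w *= 1.0 - rng.random()
    return w


def sample_r_window(d: int, rng: random.Random, back: int = 60,
                    forward: int = 40) -> List[Tuple[float, float]]:
    """Points (xi_k, a_k), k = -back..forward, of a truncated copy of the limit process R."""
    # Log-scale interval straddling level 1: size-biased length G*, split at uniform fraction V.
    v = rng.random()
    g_star = rng.gammavariate(d + 1, 1.0)
    xi = {0: math.exp(-v * g_star), -1: math.exp((1.0 - v) * g_star)}
    for k in range(1, forward + 1):          # points below xi_0: multiply by factors W
        xi[k] = xi[k - 1] * sample_w(d, rng)
    for k in range(2, back + 1):             # points above xi_{-1}: divide by factors W
        xi[-k] = xi[-k + 1] / sample_w(d, rng)
    points, arrival = [], 0.0
    for k in range(-back, forward + 1):
        points.append((xi[k], arrival))       # a_k = sum_{-back <= i < k} E_i / xi_i
        arrival += rng.expovariate(1.0) / xi[k]
    return points
```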

The limit process of heights $\mathcal{H}$ is not Poisson, since for $d > 1$ the law of $W$ is not beta$(\theta, 1)$ for any $\theta > 0$. For a similar reason, the limit process of record times, which is the time projection of $\mathcal{R}$, is also different from a Poisson process. In the discrete-time setting, the dependence between occurrences of chain records follows from our interpretation of chain records in terms of the partition $\Pi$ and a characterisation of the Ewens partitions in [22] (where it is shown that independence would force $W$ to be beta$(\theta, 1)$).

As noticed by Charles Goldie, the component-wise logarithmic transform

$$\bigl(-\log R_k^{(1)}, \dots, -\log R_k^{(d)}\bigr), \qquad k = 1, 2, \dots$$

sends the chain records in $Q_d$ to the sequence of sites visited by a $d$-dimensional random walk whose components are independent one-dimensional random walks with exponentially distributed increments. Equivalently, one can consider the upper chain records from the product exponential distribution in $d$ dimensions. In this regime, subject to a suitable normalisation, the values of chain records concentrate near the diagonal of the positive orthant.